Sparse Interpretable Audio Model
Model Architecture
This small model attempts to decompose audio featuring acoustic instruments into the
following components:
- A small (16-dimensional) global context vector
- Up to a fixed maximum number of small (16-dimensional) event vectors, each representing an individual audio event
- Times at which each event occurs
While the global context and local event data are encoded as real-valued vectors rather than discrete values, the
learned representation still lends itself to a sparse, interpretable, and hopefully easy-to-manipulate encoding.
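Under these assumptions, the encoding can be sketched as a small set of arrays. This is an illustrative sketch, not the model's actual code; the names, the maximum event count of 32, and the clip duration are hypothetical, while the 16-dimensional vectors come from the description above:

```python
from dataclasses import dataclass
import numpy as np

MAX_EVENTS = 32   # hypothetical maximum number of events per clip
LATENT_DIM = 16   # dimensionality of context and event vectors (from the text)

@dataclass
class SparseAudioEncoding:
    """Sparse, interpretable encoding of a single audio clip."""
    context: np.ndarray  # shape (LATENT_DIM,): global context vector
    events: np.ndarray   # shape (MAX_EVENTS, LATENT_DIM): one vector per event
    times: np.ndarray    # shape (MAX_EVENTS,): onset time of each event, seconds

def random_encoding(rng: np.random.Generator,
                    duration: float = 6.0) -> SparseAudioEncoding:
    """Draw a random encoding, e.g. for the resynthesis experiments below."""
    return SparseAudioEncoding(
        context=rng.standard_normal(LATENT_DIM),
        events=rng.standard_normal((MAX_EVENTS, LATENT_DIM)),
        times=np.sort(rng.uniform(0.0, duration, MAX_EVENTS)),
    )
```

The appeal of this factorization is that each component can be swapped out independently, which is exactly what the sound samples below demonstrate.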
Each sound sample below includes the following elements:
- The original recording
- The model's reconstruction
- New audio using the original timing and context vector, but random event vectors
- New audio using the original event and context vectors, but with random timings
- New audio using the original timing and event vectors, but with a random global context vector
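The three variations above amount to replacing exactly one component of the encoding while keeping the others fixed. A minimal sketch of that manipulation, using a toy dictionary encoding (the shapes and the 6-second duration are assumptions; the model's decoder is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy encoding: 16-dim global context, up to 8 events of 16 dims, onset times.
encoding = {
    "context": rng.standard_normal(16),
    "events": rng.standard_normal((8, 16)),
    "times": np.sort(rng.uniform(0.0, 6.0, 8)),
}

def randomize(encoding: dict, key: str, duration: float = 6.0) -> dict:
    """Copy the encoding, replacing one component with random values."""
    new = dict(encoding)
    if key == "times":
        new[key] = np.sort(rng.uniform(0.0, duration, encoding[key].shape))
    else:
        new[key] = rng.standard_normal(encoding[key].shape)
    return new

# The three manipulations from the listing above:
random_events  = randomize(encoding, "events")   # original timing + context kept
random_times   = randomize(encoding, "times")    # original events + context kept
random_context = randomize(encoding, "context")  # original timing + events kept
```

Feeding each modified encoding through the decoder then produces the corresponding variation of the original recording.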
Cite this Work
@misc{vinyard2023audio,
author = {Vinyard, John},
  title = {Sparse Interpretable Audio},
url = {https://JohnVinyard.github.io/machine-learning/2023/11/15/sparse-physical-model.html},
  year = 2023
}